Hitting, Pitching, Fielding, WAR, and Team Performance Predictions
Author
Jose Ayala
Published
October 14, 2025
1 Crossroads League Analysis
This project analyzes player performance in the Crossroads League baseball season, focusing on both hitting and pitching statistics. Using real game data, the analysis identifies the top performers across all teams, providing interactive visualizations and tables to explore player rankings.
For hitters, the project highlights the top 10 players per team based on key offensive metrics such as batting average, on-base percentage, slugging percentage, and OPS. For pitchers, it identifies the top 10 starting pitchers in the league (minimum seven games started), ranked by earned run average (ERA), strikeouts, and WHIP.
With interactive charts, dropdown selections, and sortable tables, users can easily compare teams, evaluate individual player performances, and see which players had the most impact during the season.
1.2.1 Top 10 Hitters in the Crossroads League (AVG)
What it shows: A vertical bar chart ranking the ten players with the highest batting averages. Hover reveals hits, AB, HR, and OBP. How to read it: Taller bars = higher batting average. Use the hover text to compare opportunities (AB) so you don’t overvalue small-sample rates. Interpretation tip: Combine AVG with OBP/OPS to identify hitters who both hit for average and get on base. Note that the sample cutoff (AB ≥ 60) reduces small-sample noise but does not remove it entirely.
What it shows: Grouped bars where each stat is scaled to 0–1 so different units are comparable. How to read it: Within each player group, taller bars indicate relatively stronger performance for that metric. This reveals whether a player’s value comes from contact, power, or getting on base. Interpretation tip: Use this to spot players who are balanced across metrics versus specialists (e.g., high HR but lower OBP).
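The 0–1 scaling described above can be sketched as a min-max rescale. This is a minimal illustration in base R, assuming min-max normalization (the text only says each stat is scaled to 0–1); the `rescale01` helper and the sample values are hypothetical, not taken from the project.

```r
# Hypothetical helper: min-max rescale a numeric vector to the 0-1 range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column has no spread to scale, so place every value mid-range.
  if (rng[1] == rng[2]) return(rep(0.5, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up stat lines for three hitters (not real league data).
hitters <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  avg  = c(0.300, 0.350, 0.400),
  hr   = c(2, 10, 6)
)

scaled <- hitters
scaled$avg <- rescale01(hitters$avg)  # lowest AVG -> 0, highest -> 1
scaled$hr  <- rescale01(hitters$hr)   # lowest HR  -> 0, highest -> 1
print(scaled)
```

Because each stat is rescaled within the displayed group, a bar height of 1 means "best among these players", not "best possible".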
What it shows: Players ranked by combined on-base and slugging performance. Hover shows AVG, OBP, SLG, HR, RBI, AB. How to read it: OPS is a simple composite of on-base skills and power; higher is better. Compare with the AVG chart to see if top AVG hitters also provide power and plate discipline. Interpretation tip: OPS is a good offensive summary but doesn’t capture defense or baserunning.
What it shows: For whichever team is selected, the chart shows the five players with the highest OPS on that roster. Hover reveals supporting stats. How to read it: Use the dropdown to compare how teams are constructed offensively (one standout player vs. a balanced lineup). Interpretation tip: Helpful for scout-style comparisons; check AB to ensure the ranking reflects regular playing time.
1.2.5 Comparison of Top 5 Hitters (Choose Two Teams)
What it shows: Grouped bars comparing the two teams’ top hitters on OPS, with hover details. How to read it: Choose any pair of teams to directly compare their best hitters. Look for cross-team advantages in OPS depth. Interpretation tip: Useful when arguing which team has the stronger top lineup or more depth.
1.2.6 Top Hitters in the League By WAR (Wins Above Replacement)
What is WAR?
In baseball, WAR stands for Wins Above Replacement, a statistic that quantifies a player’s overall value to their team in terms of wins. It compares a player’s performance across all aspects of the game (hitting, fielding, base running) to a hypothetical “replacement-level” player, determining how many additional wins their production contributes compared to that baseline.
What it shows: Table ranking hitters by WAR_est = (OPS - league_avg_OPS) / 0.12. How to read it: Higher WAR_est = more value vs. a league-average hitter. Read the method note: this is a classroom proxy, not official WAR. Limitations: No park, position, or defense adjustments are applied here; use the blended WAR for a fuller view.
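The WAR_est formula above can be written as a one-line function. This is a small sketch in base R using made-up OPS values; the function name `war_est` and the argument name `ops_per_win` (for the 0.12 divisor) are illustrative labels, not names from the project code.

```r
# Classroom WAR proxy from the text: (OPS - league_avg_OPS) / 0.12.
# `ops_per_win` is a hypothetical name for the 0.12 divisor.
war_est <- function(ops, league_avg_ops, ops_per_win = 0.12) {
  (ops - league_avg_ops) / ops_per_win
}

# Made-up OPS values (not real league data).
ops <- c(0.980, 0.800, 0.740)
league_avg_ops <- 0.800

war <- round(war_est(ops, league_avg_ops), 2)
war  # 1.5, 0.0, -0.5
```

A hitter exactly at the league mean scores 0, so negative values simply mean below-average OPS within the comparison pool.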
What it shows: Two-bar chart comparing a chosen player’s OPS with the league mean; shows percentile and delta. How to read it: Use to highlight an individual’s offensive standing. Hover shows AB and percentile. Interpretation tip: Useful for spotlight slides in a presentation.
library(readr)
library(dplyr)
library(stringr)
library(plotly)

# player
player_name <- "Kaleb Kolpien"  # change this to any player
ab_min <- 60

hitters <- read_csv("~/Downloads/baseball_data-2.csv", show_col_types = FALSE) %>%
  mutate(
    Name = str_squish(Name),
    team = str_squish(team),
    ab = as.numeric(ab),
    obp = as.numeric(obp),
    slg = as.numeric(slg),
    OPS = obp + slg
  )

league_pool <- hitters %>%
  filter(!is.na(OPS), ab >= ab_min)

if (!player_name %in% league_pool$Name) {
  stop(paste0("Player '", player_name, "' not found with ab >= ", ab_min,
              ". Try another name or lower ab_min."))
}

player_row <- league_pool %>%
  filter(Name == player_name) %>%
  arrange(desc(ab)) %>%
  slice(1)

league_avg_ops <- mean(league_pool$OPS, na.rm = TRUE)
ops_ecdf <- ecdf(league_pool$OPS)
player_pct <- round(100 * ops_ecdf(player_row$OPS), 1)

comp <- tibble(
  Label = c(paste0(player_row$Name, " (", player_row$team, ")"), "League Avg"),
  OPS = c(player_row$OPS, league_avg_ops)
) %>%
  mutate(Label = factor(Label, levels = Label))

# chart
p <- plot_ly(
  comp,
  x = ~Label, y = ~OPS, type = "bar",
  text = ~paste0(
    "OPS: ", round(OPS, 3),
    ifelse(Label == levels(Label)[1],
           paste0("<br>Team: ", player_row$team,
                  "<br>AB: ", player_row$ab,
                  "<br>Percentile: ", player_pct, "th",
                  "<br>Δ vs Lg: ", sprintf("%+.3f", player_row$OPS - league_avg_ops)),
           "")
  ),
  hoverinfo = "text"
) %>%
  layout(
    title = paste0("OPS Comparison: ", player_row$Name, " vs League (AB ≥ ", ab_min, ")"),
    xaxis = list(title = "", tickangle = 0),
    yaxis = list(title = "OPS"),
    shapes = list(list(
      type = "line", x0 = -0.5, x1 = 1.5, xref = "x",
      y0 = league_avg_ops, y1 = league_avg_ops, yref = "y",
      line = list(dash = "dash", width = 2)
    )),
    annotations = list(
      list(
        x = 1.05, y = league_avg_ops, xref = "paper", yref = "y",
        text = paste0("League Avg: ", round(league_avg_ops, 3)),
        showarrow = FALSE, xanchor = "left"
      ),
      list(
        x = 0, y = max(comp$OPS, na.rm = TRUE), xref = "x", yref = "y",
        text = paste0("Percentile: ", player_pct, "th<br>Δ OPS: ",
                      sprintf("%+.3f", player_row$OPS - league_avg_ops)),
        showarrow = FALSE, xanchor = "left", yanchor = "bottom"
      )
    )
  )

p
1.2.8 Hitters vs League Average (Multiple Categories)
What it shows: Grouped bars (one player vs league) for the four metrics; a dropdown selects the player. How to read it: Quickly assess whether a player’s value is driven by average, on-base ability, or power.
What it shows: Bar chart ranking starters by ERA with hover showing IP, K, W, WHIP, and GS. How to read it: Lower ERA is better; use IP and WHIP context to judge durability and efficiency. Interpretation tip: ERA alone can be influenced by defense and other aspects of the game, so look at WHIP and K/9 to corroborate performance.
What it shows: Table ranking pitchers by WAR_est derived from ERA and innings pitched. How to read it: This gives a quick value estimate for starters; note the simplified math and describe limitations in the write-up.
What it shows: Bar chart showing each team’s mean fielding percentage; hover shows total errors and chances. How to read it: Higher fielding % generally indicates fewer costly miscues, but consider total chances (teams with more chances may have different contexts). Interpretation tip: Fielding % ignores range and advanced defensive value, so combine it with error rate and positional context.
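As a rough sketch of the team-level summary above, fielding percentage is (PO + A) / (PO + A + E). The base R example below uses made-up rows with the column names from the fielding table (`team`, `po`, `a`, `e`, `tc`, `f_pct`). It computes the mean of player-level percentages, as the chart describes, alongside a pooled percentage that weights each player by chances handled; the pooled version is shown only for comparison and is not claimed to be part of the project.

```r
# Made-up fielding lines (not real league data).
fielding <- data.frame(
  team = c("A", "A", "B", "B"),
  po   = c(90, 8, 50, 40),
  a    = c(5, 1, 30, 8),
  e    = c(5, 1, 0, 2)
)
fielding$tc    <- fielding$po + fielding$a + fielding$e     # total chances
fielding$f_pct <- (fielding$po + fielding$a) / fielding$tc  # fielding percentage

# Mean of player-level percentages: every player counts equally.
mean_fpct <- aggregate(f_pct ~ team, data = fielding, FUN = mean)

# Pooled percentage: sum the counts first, so busy fielders count more.
totals <- aggregate(cbind(po, a, e) ~ team, data = fielding, FUN = sum)
totals$pooled_fpct <- (totals$po + totals$a) / (totals$po + totals$a + totals$e)

print(mean_fpct)
print(totals)
```

The two summaries diverge when a low-usage player has an extreme percentage, which is one reason the hover's total chances are worth checking.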
1.4.2 Chart 2, Player-level Errors vs Total Chances
What it shows: Each point is a player; the x-axis is total chances (workload), y-axis is errors, color encodes fielding % and size encodes games played. How to read it: Players with high chances and low errors are strong defenders; outliers with high errors for many chances warrant further review. Interpretation tip: Use this to highlight high-usage defensive liabilities or standout regulars.
datatable(
  fielding %>%
    select(name, team, pos, gp, tc, po, a, e, f_pct, err_rate) %>%
    arrange(team, desc(gp)),
  caption = "Fielding table (use filters to explore)",
  options = list(pageLength = 15, scrollX = TRUE),
  rownames = FALSE
)
1.5 Team Analysis
1.5.1 Top 25 Players in the Crossroads League
What it shows: Interactive table of WAR components and a bar chart of the top 25 players by WAR_est_mix. Hover shows the OffRuns / FieldingRuns / PosAdjRuns breakdown. How to read it: Higher WAR_est_mix = more wins contributed above the replacement-level proxy. Use the component breakdown to see whether value is offensive or defensive. Important note: This is an experimental approximation; it is imperfect because other stats needed for a full WAR calculation are not available.
1.5.2 Wins Predictor Using Pythagorean Expectation
What it shows: Table and bar chart of predicted wins for each team derived from estimated runs scored (RBI proxy) and runs allowed (ERA/IP). How to read it: Higher bars = higher expected wins. Use this to compare how the run-based predictions align with WAR and the Team Power Index.
What it shows: Each point is a team; hover reveals OPS, HR, and RBI totals. How to read it: Upper-right quadrant = teams that both get on base and hit for power (most dangerous offensively).
1.5.4 Team Pitching Efficiency
What it shows: Each point is a team; hover reveals ERA, strikeouts, wins, and WHIP. How to read it: Lower ERA with high strikeouts indicates strong run prevention and swing-and-miss ability.
# chart2
team_pitching <- pitching %>%
  group_by(team) %>%
  summarise(
    avg_era = mean(era, na.rm = TRUE),
    total_k = sum(k, na.rm = TRUE),
    total_wins = sum(w, na.rm = TRUE),
    avg_whip = mean(whip, na.rm = TRUE),
    .groups = "drop"
  )

p_pitching <- plot_ly(
  team_pitching,
  x = ~avg_era,
  y = ~total_k,
  type = "scatter",
  mode = "markers+text",
  text = ~team,
  textposition = "top center",
  marker = list(size = 12, color = "darkorange"),
  hovertext = ~paste0(
    "<b>", team, "</b><br>",
    "ERA: ", round(avg_era, 2),
    "<br>Strikeouts: ", total_k,
    "<br>Wins: ", total_wins,
    "<br>WHIP: ", round(avg_whip, 2)
  ),
  hoverinfo = "text"
) %>%
  layout(
    title = "Team Pitching Efficiency — ERA vs Total Strikeouts",
    xaxis = list(title = "Average ERA (lower is better)"),
    yaxis = list(title = "Total Strikeouts (higher is better)")
  )

p_pitching
1.5.5 Team Power Index
What it shows: Ranked bar chart and table showing a single composite score that blends offense and pitching performance. Hover shows components and z-scores. How to read it: Higher index = stronger overall team. Use the table to see whether teams are driven by offense or pitching.
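One way a z-score composite like this could be built is sketched below in base R. The choice of components (team OPS for offense, team ERA for pitching), the equal weighting, and all names and values are assumptions for illustration; the project's actual index may use different inputs or weights.

```r
# Made-up team aggregates (not real league data).
teams <- data.frame(
  team = c("A", "B", "C"),
  ops  = c(0.850, 0.780, 0.710),
  era  = c(4.00, 5.00, 6.00)
)

# z-score: how many standard deviations a team sits from the league mean.
zscore <- function(x) (x - mean(x)) / sd(x)

teams$z_off   <- zscore(teams$ops)
teams$z_pitch <- -zscore(teams$era)  # flip the sign because lower ERA is better

# ASSUMPTION: offense and pitching are weighted equally.
teams$power_index <- (teams$z_off + teams$z_pitch) / 2

# Rank teams from strongest to weakest.
ranked <- teams[order(-teams$power_index), ]
print(ranked)
```

Comparing `z_off` with `z_pitch` for a team shows which side of the ball drives its index, which is the reading the table is meant to support.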
Data are from the Crossroads League season CSVs provided (hitters, pitchers, fielding). All analyses reflect the season as recorded in those files and assume the data are final and cleaned.
The project uses a 50-game college season as the reference for season-level scaling (for example, positional adjustments are prorated to 50 games).
Key calculations & shortcuts
OPS = OBP + SLG (computed where not present).
WAR_estimates are classroom proxies:
Hitting WAR proxy: OPS-based (OPS − league_avg_OPS) / 0.12.
Pitching WAR proxy: ERA-based model that scales by innings (not official WAR).
Blended WAR (WAR_est_mix) combines Offensive Runs (OPS-based), Fielding Runs (error rate vs. position), and a positional adjustment; total runs are converted to wins using 10 runs ≈ 1 win.
Team win predictions use the Pythagorean expectation with exponent = 1.83, scaled to a 50-game season; runs scored are estimated from hitting aggregates (e.g., RBI proxy), and runs allowed from pitching (ERA × IP / 9).
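The win-prediction step in the list above can be sketched as follows. This is a minimal base R illustration of the Pythagorean expectation with the stated exponent (1.83) and 50-game scaling; the function name `pythag_wins`, the column names, and the team totals are made up for the example.

```r
# Pythagorean expectation: win% = RS^k / (RS^k + RA^k), with k = 1.83 as in the text.
pythag_wins <- function(runs_scored, runs_allowed, games = 50, exponent = 1.83) {
  win_pct <- runs_scored^exponent /
    (runs_scored^exponent + runs_allowed^exponent)
  games * win_pct
}

# Made-up team totals (not real league data).
teams <- data.frame(
  team = c("A", "B"),
  rbi  = c(300, 250),  # RBI used as the runs-scored proxy
  era  = c(5.00, 6.00),
  ip   = c(450, 450)
)

# Runs allowed implied by ERA and innings pitched: ERA x IP / 9.
teams$runs_allowed <- teams$era * teams$ip / 9
teams$pred_wins    <- pythag_wins(teams$rbi, teams$runs_allowed)
print(teams)
```

A team that scores exactly as many runs as it allows projects to 25 wins in 50 games, which is a quick sanity check on any implementation.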
Assumptions & limitations
Many metrics here are approximations. They omit advanced adjustments that professional metrics use (park factors, base-running, defensive range, bullpen context, replacement-level baselines, play-by-play event weighting, etc.).
Fielding value is approximated from error rates and fielding percentage; these measures do not capture range, defensive runs saved, or subtle positioning differences.
Using RBI as a proxy for team runs scored is imperfect (RBI undercounts some run contexts); treat the Pythagorean predictions as indicative, not definitive.
Small sample sizes: players with limited AB or IP can produce unstable rate stats. Many charts apply simple qualifiers (e.g., AB ≥ 60, GS ≥ 7) to reduce noise.
3 Key Terms and Metrics
Below is a short glossary of key baseball and analytical terms used throughout this project.
It provides quick definitions for both baseball-specific and statistical metrics.
| Term | Definition |
| --- | --- |
| AB (At-Bats) | The number of official batting attempts, excluding walks, sacrifices, or hit-by-pitch. |
| AVG (Batting Average) | Hits divided by at-bats; measures how often a player gets a hit. |
| OBP (On-Base Percentage) | How often a player reaches base (via hits, walks, or hit-by-pitch). |
| SLG (Slugging Percentage) | Measures the total number of bases per at-bat; shows a hitter’s power. |
| OPS (On-base Plus Slugging) | OBP + SLG, a combined measure of a hitter’s ability to get on base and hit for power. |
| RBI (Runs Batted In) | The number of runners who score because of a player’s hit, walk, or sacrifice. |
| HR (Home Runs) | The number of times a player hits the ball and scores by circling all bases in one play. |
| XBH (Extra Base Hits) | Total of doubles, triples, and home runs; hits that go for more than one base. |
| ERA (Earned Run Average) | The average number of earned runs a pitcher allows per 9 innings pitched. Lower = better. |
| WHIP (Walks + Hits per Inning Pitched) | Measures how many baserunners a pitcher allows per inning; lower values indicate fewer baserunners allowed. |
| IP (Innings Pitched) | Total innings a pitcher has thrown. One inning = three outs. |
| GS (Games Started) | The number of games in which a pitcher was the first to appear for their team. |
| WAR (Wins Above Replacement) | An overall measure of a player’s value in wins compared to a replacement-level player. |